Non-monotonic recursive polynomial expansions for linear scaling calculation of the 

density matrix 
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As it stands, density matrix purification is a powerful tool for linear scaling electronic structure 
calculations. The convergence is rapid and depends only weakly on the band gap. However, as will 
be shown in this paper, there is room for improvements. The key is to allow for non-monotonicity 
in the recursive polynomial expansion. Based on this idea, new purification schemes are proposed 
that require only half the number of matrix-matrix multiplications compared to previous schemes. 
The speedup is essentially independent of the location of the chemical potential and increases with 
decreasing band gap. 



During the last two decades, methods have been de- 
veloped that make it possible to apply ab initio elec- 
tronic structure calculations, using Hartree-Fock, Kohn- 
Sham density functional theory, or tight-binding models, 
to systems with many thousands of atoms. Although 
the computational cost of these methods increases only 
linearly with system size, such calculations are extremely 
demanding. Therefore, there is a need to improve exist- 
ing linear scaling methods in order to reduce the com- 
putational cost and make best use of modern computer 
resources. 

In linear scaling electronic structure calculations, effi- 
cient computation of the one-particle density matrix D 
for a given effective Hamiltonian F is an important ingre- 
dient. Many methods for linear scaling computation of 
the density matrix have been proposed. A common ap- 
proach is to employ a polynomial expansion of the func- 
tion D = 0{ijlI — F), where 9 is the Heaviside step func- 
tion and fj, is the chemical potential. The expansion may 
be built up serially by a Chebyshev serics^^^ or recur- 
sively by density matrix purification^'^^^^ or sign matrix 
methods. ^^'^^ Another approach is to minimize an energy 
functional with respect to the density matrix. -'^''^^^ 

For the isolated problem of computing the density ma- 
trix for a fixed Hamiltonian, the recursive density matrix 
purification schemes are highly efficient. The convergence 
is rapid and the computational cost scales as C'(ln(Ae/^)) 
where Ae is the spectral width of the effective Hamilto- 
nian matrix and ^ is the b and gap.^-'^'^-'^ This should be 
compared to an ©(-^/Ae/J) cost for the serial polynomial 
expansion^ and minimization^'^^ methods. However, de- 
spite the excellent performance of previously proposed 
density matrix purification schemes, substantial improve- 
ments are still possible as will be shown in this letter. 

In density matrix purification, the effective Hamilto- 
nian matrix is first shifted and scaled so that the eigen- 
values end up in the [0, 1] interval in reverse order. After 
that, low order polynomials with fixed points at and 1 
are recursively applied to build up the desired step func- 
tion. The general iterative procedure can be formulated 



as 



Xo = foiF) 



(1) 



where /o is the initial linear transformation and fi,i = 
1,2,... is a sequence of low order polynomials. 

Purification can either be carried out with fixed or 
varying chemical potential ^. In case of fixed-/i purifi- 
cation, a single polynomial with an unstable fixed point 
in ]0, 1[ is typically used for all fi,i > 0. The initial 
transformation /o maps the chemical potential to the un- 
stable fixed point. The purification process then brings 
the eigenvalues to their desired values of and 1. In 
case of varying-^ purification, the chemical potential is 
allowed to move during the iterations. This flexibility can 
be used to automatically adjust the expansion so that the 
correct number of electrons is obtained, as in canonical^" 
and trace-correcting"'^"'^ purification. 

In any case, the idea has been to use polynomials that 
increase monotonically in [0, 1] and have fixed points 
and vanishing derivatives at and 1. As discussed by 
Niklasson,^^ it can be understood that a recursive ex- 
pansion using such polynomials will converge towards a 
step function. In the following, we shall use the nota- 
tion Pij{x) for the polynomial of degree 1 + i + j with 
fixed points at and 1 and with i and j vanishing deriva- 
tives at and 1, respectively. Many previously proposed 
purification polynomials can be written in this form.^^ 

In this letter, we withdraw from the idea of using 
monotonically increasing purification polynomials. A 
scale and fold technique giving non-monotonic purifica- 
tion transformations is proposed that results in improved 
performance of both fixed- and varying-// purification 
schemes. The new idea is the following - before each 
iteration, the eigenspcctrum is stretched out outside the 
[0, 1] interval. Some of the polynomials of the form Pij 
can then be used to fold the eigenspectrum over itself. 



For example, the polynomial Pi.o(a;) 



can be used 



to fold the unoccupied part of the eigenspectrum if the 
eigenspectrum is stretched out below before its appli- 
cation. Similarly, the polynomial Po.i{x) = 2x — can 
be used to fold the occupied part. In general, the scale 
and fold technique can for a polynomial be used for 
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the unoccupied part if i is odd and for the occupied part 
if j is odd. 

Similar scahng techniques have previously been em- 
ployed to improve the convergence of Newton iterations 
for sign matrix evaluations. ^'^'^^ However, in this case the 
regular unsealed iteration keeps the eigenvalues outside 
the interval and the scaling is used to shrink rather than 
stretch out the eigenspectrum. 

We will first apply the scale and fold technique to fixed- 
/i purification using a polynomial Pm,m with m being 
odd. For such polynomials, the technique can be used 
to fold both the unoccupied and occupied parts of the 
eigenspectrum in each iteration. In this case the non- 
monotonic purification transformation 

= Prn,m{aiX^-l - 0.5) + 0.5/) (2) 

where a > 1 determines the amount of scaling around 
0.5. The complete algorithm for the special case m = 1 
is given in Algorithm 1, where Amin and Amax are the 
extremal eigenvalues of F or bounds thereof. For sim- 
plicity, it is assumed here that the band gap is located 
symmetrically around fi. The expression for a can be 
derived by solving 

Pm,Ml3 ~ 0.5) + 0.5) - P™,™(0.5(1 - a)) (3) 

for a > 1. Here, /3 is a parameter depending on the 
eigenvalue closest to 0.5, see Algorithm 1. The behavior 
of Algorithm 1 is illustrated in Figure 1. The behavior 
of the regular grand-canonical purification algorithm, 
corresponding to Algorithm 1 with a = 1, is shown for 
reference. Note how the scaled variant is able to take ad- 
vantage of the additional flexibility given by allowing for 
non-monotonicity, resulting in much faster convergence. 
Fixed-/Lt purification schemes with scaling can also be de- 
rived for other polynomials of the form Pij where i and 
j are both odd and larger than 0. Note that the scaling 
should be performed around the unstable fixed point of 
the polynomial which will differ from 0.5 if i 7^ j. 



Algorithm 1 McWeeny based fixed-/i purification 

Input: -F, Amin, Amax, Mi C 
1:7 = 2 max(Amax - /i, /i — Amin) 

2: Xa = + 0.51 

3: /3 = 0.5(1 - ^/7) 

4: for i — 1,2, ... ,n do 

5: a = 3/\/l2/32_18/?-h9 

6: Xs = a{X,-i ~ 0.51) +0.51 

7: X^ = 3Xf - 2X't 

8: Ps^a{P- 0.5) +0.5 

9: 13 = 3/3? - 
10: end for 
11: return D = X„ 



The scale and fold technique can also be used to- 
gether with varying-^ purification. We shall here focus 
on purification based on the polynomials Pg 1 and Pi^. 
These polynomials can be used to adjust the occupation 
count;-'^^ if the occupation is too high, the Pi^ polynomial 



Algorithm 2 Po,i & Pi,o based varying-/x purification 

Input; Amin 5 Amax 5 Alumoj Ahomo 

1: Xo = MF) = (Amax/ - F)/(Amax - Amin) 

'i: P = /o(Alumo) 

3: /? = /o(Aliomo) 

4: for i = 1,2, ... ,n do 
5: if /3 + /3 > 1 then 

6: Q = /3/(2-/3) 

7: X, = ((14- q)X,_i - q/)2 

8: /3 = ((l + a)/3-Q)2 

9: /3 = ((1 + q)/3-q)2 

10: else 

11: a = (l-^)/(l + ^) 

12: X,=2{l + a)X^-i-{l + OL)^Xti 

13: /3 = 2(l-|-Q)/3-(l + a)2/32 

14: P = 2{l + a)P-{l + a)^l3^ 

15: end if 

16: end for 

17: return D = X„ 



is applied, otherwise Po,i is applied. The scaling should 
in this case be chosen to stretch out the eigenspectrum 
below before application of and above 1 before ap- 
plication of 2x — . The purification transformations 
are 

= Pi,o((l + a)X,_i - al) (4) 

and 

/,(X,_i) =Po,i((l + a)^«-i) (5) 

where a > determines the amount of scaling. A com- 
plete algorithm is given in Algorithm 2, where Aiumo and 
Ahomo are the eigenvalues closest above and below the 
band gap, respectively. Without scaling, i.e. a = 0, this 
algorithm is essentially equivalent to the second order 
trace correcting purification scheme by Niklasson,^^ the 
only difference being how to choose polynomial in line 5 
of the algorithm. In the original work by Niklasson, the 
choice was based on the trace of the current density ma- 
trix approximation. Here, the polynomial is chosen based 
on the eigenvalues /3 and /3 that correspond to the low- 
est unoccupied and highest occupied molecular orbitals, 
respectively.^^ The behavior of Algorithm 2 is illustrated 
in Figure 2. The regular scheme with a = is shown for 
reference. 

Figures 1 and 2 show that the use of scaling results 
in more rapid convergence. In order to closer study the 
performance enhancement given by the scaling technique 
we shall consider diagonal test Hamiltonians with vary- 
ing chemical potential and band gap. As previously dis- 
cussed by Maziotti,-*^^ the results for a given chemical 
potential and a given band gap are valid for any Hamil- 
tonian with that band gap and chemical potential. 

Figure 3(a) shows that the proposed scaling tech- 
niques give significant speedup independently of the lo- 
cation of the chemical potential. As can be seen in Fig- 
ure 3(b), the cost of the scaled purification schemes scale 
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FIG. 1: Mapping of the eigenspectrum after 1, 3, and 5 iterations respectively of McWeeny based fixed-/i purification with and 
without use of scaling. In this illustrative example Ae/^ = 10 and the chemical potential n is located at Amin+0.25(Amax — Amin). 
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Eigenspectrum of X^^ ^ ^ Eigenspectrum of X^^ ^ ^ Eigenspectrum of X^^ 



(a) First iteration (b) Fifth iteration (c) Ninth iteration 



FIG. 2: Mapping of the eigenspectrum after 1, 5, and 9 iterations respectively of Po,i & Pi,o based varying-/! purification with 
and without use of scaling. In this illustrative example Ae/^ — 10 and the chemical potential fi is located at Amin + 0.25(Amax — 

Amin). 
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FIG. 3: Number of matrix-matrix multiplications needed to reach an accuracy of \\D — D\\2 < 10^^, where D is the computed 
approximation of the exact density matrix D. The test calculations presented in Panel (a) were performed on test Hamiltonians 
with band gaps ^ = 0.01 and varying chemical potential /j.. The test calculations presented in Panel (b) were performed on test 
Hamiltonians with chemical potentials ^ = 0.5 and varying band gap ^. In all cases, the spectral widths of the test Hamiltonians 
were Ae — 1. The test cases in Panel (a) are essentially equivalent to the test cases presented in Figure 2 of Ref. 11. 
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as 0(1/ In ^) with the band gap ^, just as for the reg- 
ular schemes. However, the convergence for the scaled 
schemes is around twice as fast as for the regular schemes. 

The scaling technique requires some information about 
the location of the band gap. More precisely, a lower 
bound of the lower edge and an upper bound of the upper 
edge of the band gap arc needed. It should be noted that 
incorrect bounds can lead to a mix-up between occupied 
and unoccupied states. However, even if the bounds arc 
not tight, the scaling technique can be used although the 
effect will not be as good as it could have been. Tight 
bounds can be obtained by some technique for calculation 
of interior eigenvalues. 

The performance was here measured by the number of 
matrix-matrix multiplications needed to reach a certain 
accuracy. In practical linear scaling calculations, efficient 
ways to bring about sparsity is critical for the perfor- 
mance. Since the proposed schemes are on the standard 
form given by (1), it is possible to combine them with 
previously suggested schemes for control of the forward 
error. As fewer iterations are needed, more aggressive 



truncation of small matrix elements can be used in each 
iteration. Therefore, we expect that the speedup given 
by the proposed techniques will be even better when the 
additional problem of bringing about sparsity is taken 
into account, although this is something that needs to be 
further investigated. 

In this letter, non-monotonic recursive polynomial ex- 
pansions for calculation of the density matrix were pro- 
posed. We have withdrawn from the idea that the ap- 
proximation of the step function should be monotonically 
increasing and show that this makes it possible to find 
new, more efficient non-monotonic purification transfor- 
mations. The scaled purification variants of this work 
represent a substantial improvement compared to pre- 
vious purification schemes. The reduction in computa- 
tional cost is essentially independent of the location of 
the chemical potential and the proposed schemes are par- 
ticularly efficient in case of small band gaps. 

Comments from Sara Zahedi and support from the 
Swedish Research Council under Grant No. 623-2009- 
803 are gratefully acknowledged. 
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